An analysis of transcription consistency in spontaneous speech from the Buckeye corpus
Authors
Abstract
We present a preliminary analysis of transcriber consistency in labeling and segmentation of words and phones in the Buckeye corpus of spontaneous, informal speech. We find that pairwise inter-transcriber agreement on exact phone label match is 76%, and segmentation agreement within 20% of phone pair length is 75%, though longer phones are segmented more consistently than shorter phones. Labeling consistency varies by phonetic category in patterns similar to those reported for read speech. More agreement is seen on consonants than on vowels, and on fricatives and labials than on other consonant classes. In general, we find that shorter, more reduced words and phones result in more transcriber disagreement.
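The abstract does not spell out how the two agreement figures were computed, so the following is only a minimal sketch of one plausible reading, not the authors' procedure. It assumes a hypothetical data layout in which each transcription is a list of `(label, start, end)` tuples that are already aligned phone by phone, and it interprets "within 20% of phone pair length" as a tolerance equal to 20% of the combined duration of the two phones flanking a boundary, averaged over the two transcribers. All function names are illustrative.

```python
from itertools import combinations


def _check_aligned(a, b):
    # Assumption: both transcriptions cover the same phones in the same order.
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("transcriptions must be aligned and contain at least 2 phones")


def label_agreement(a, b):
    """Fraction of aligned phones whose labels match exactly."""
    _check_aligned(a, b)
    matches = sum(1 for (la, _, _), (lb, _, _) in zip(a, b) if la == lb)
    return matches / len(a)


def boundary_agreement(a, b, tolerance=0.20):
    """Fraction of internal boundaries on which two transcribers agree.

    A boundary counts as agreed when the two placements differ by at most
    `tolerance` times the combined duration of the two flanking phones
    (an assumed reading of "within 20% of phone pair length").
    """
    _check_aligned(a, b)
    n_boundaries = len(a) - 1
    agreed = 0
    for i in range(n_boundaries):
        # Boundary i is the end of phone i; the flanking pair is phones i and i+1.
        pair_len_a = a[i + 1][2] - a[i][1]
        pair_len_b = b[i + 1][2] - b[i][1]
        pair_len = (pair_len_a + pair_len_b) / 2
        if abs(a[i][2] - b[i][2]) <= tolerance * pair_len:
            agreed += 1
    return agreed / n_boundaries


def mean_pairwise(transcriptions, metric):
    """Average a two-transcriber metric over every pair of transcribers."""
    pairs = list(combinations(transcriptions, 2))
    return sum(metric(a, b) for a, b in pairs) / len(pairs)
```

Calling `mean_pairwise(transcriptions, label_agreement)` on a list of aligned transcriptions of the same utterance yields a pairwise label agreement rate comparable in kind to the 76% figure. Real transcriptions often differ in the number of phones, so an alignment step would be needed before these functions apply.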
Similar resources
An analysis of coding consistency in the transcription of spontaneous speech from the Buckeye corpus
The Buckeye corpus of conversational speech: labeling conventions and a test of transcriber reliability
This paper describes the Buckeye corpus of spontaneous American English speech, a 307,000-word corpus containing the speech of 40 talkers from central Ohio, USA. The method used to elicit and record the speech is described, followed by a description of the protocol that was developed to phonemically label what talkers said. The results of a test of labeling consistency are then presented. The c...
Understanding VOT Variation in Spontaneous Speech
This paper reports a corpus study on the variation of VOT in voiceless stops in spontaneous speech. Two speakers’ data from the Buckeye corpus are used: one is an older female speaker with a low speaking rate while the other is a younger male speaker with an extremely high speaking rate. Linear regression analysis shows that place of articulation, word frequency, phonetic context, speech rate a...
The Buckeye corpus of speech: updates and enhancements
This paper describes recent progress in the development of the Buckeye Corpus of Speech, a phonetically labeled corpus of conversational American English speech, first described in [1]. With the publication of the second phase of transcription, the corpus has nearly doubled in size from the first release. We briefly give an overview of the corpus, report on additional studies of inter-labeler a...
Improving transcription agreement of non-native English speech corpus transcribed by non-natives
This paper proposes an economical and effective phonetic transcription method for dealing with a large amount of nonnative English speech corpus. The method provides a consistent transcription agreement, although the corpus is transcribed by non-natives. To minimize the possibility of confusion in transcription process, forced aligned phone sequences and a set of possible mispronunciation candi...
Publication date: 2002